
Poster: Victor3 Date: May 16, 2009 3:51pm
Forum: texts Subject: Re: How to correct a title already uploaded?

I second this. I often come across titles where the volume number is missing. As a result, if you want all volumes of a three-volume work, you have to open a great many texts before you find them. One example: http://www.archive.org/details/lehrundhandbuch00thangoog . Perhaps the title here was longer than the title field allowed. If so, one would need one field for a short title and one field for the complete title.

Suggestions:

* Offer more fields while uploading: title, author, year. Make clear whether in the first title field offered there should be the whole title or only a short title.

* Offer users the possibility to make changes. A few days ago there was the possibility to suggest changes. That has gone now.

User tpb uploaded a host of wonderful texts. Thank you! But for many of them the data are incomplete. Sometimes the title is incomplete, the volume number is missing, or the author name is missing. Example: http://www.archive.org/details/modernephilosop00unkngoog .
How can I complete the data of such entries?



This post was modified by Victor3 on 2009-05-16 22:51:07


Poster: hank_b (Administrator, Curator, or Staff) Date: May 27, 2009 4:04pm
Forum: texts Subject: Re: How to correct a title already uploaded?

Victor3 wrote:

User tpb uploaded a host of wonderful texts. Thank you! But for many of them the data are incomplete. Sometimes the title is incomplete, the volume number is missing, or the author name is missing. Example: http://www.archive.org/details/modernephilosop00unkngoog .

========

We are in the midst of a clean-up pass over *all* the public-domain Google books that have been contributed to the Archive, some of which, as you note, arrived with incomplete metadata. Take another look at modernephilosop00unkngoog: it was revisited on May 21, at which point info on author and contributing library was found and added to our copy.

In the past week we've added author info to 25,000 of the Google books for which we had none, and contributor info to 15,000 Google books that lacked it. The number of our Google books with no title given has dropped from 8400 to ~100.

There are no doubt many remaining problems in a collection this size, but bit by bit, we are working to make improvements.

Hank Bromley
software engineer
Internet Archive


Poster: Victor3 Date: May 28, 2009 5:57am
Forum: texts Subject: Re: How to correct a title already uploaded?

Thank you!
The following would also be desirable:
* That the YEAR appears in an extra field, in the title, or in the description, and

* That the VOLUME NUMBER is added to the title (For example, search for:
title:(mikrokosmus) AND creator:(lotze)
15 items are found, but one cannot see which is which of three volumes.)

* That the complete title with SUBTITLE appears in the description (or in an extra field for the subtitle).


Poster: hank_b (Administrator, Curator, or Staff) Date: May 28, 2009 5:28pm
Forum: texts Subject: Re: How to correct a title already uploaded?

If you use the advanced search features (http://www.archive.org/advancedsearch.php, scroll down to "Advanced XML Search"), you can control which fields are included in the search results. For instance, regarding the series that you mention, you could do:

http://www.archive.org/advancedsearch.php?q=title%3Amikrokosmus+AND+creator%3Alotze+AND+mediatype%3Atexts&;fl[]=creator&fl[]=identifier&fl[]=title&fl[]=year&sort[]=year+asc&rows=50&save=yes&fmt=tables&xmlsearch=Search

That will show you the title and year of publication for each item in the search results, and sort the results by year. This example shows the results as an html table, but you can also specify that the results be returned as XML or JSON, enabling automated further processing of the results if you're a programmer.
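For programmers, a query like the one above could be assembled in code rather than by hand. The following is only a minimal sketch: it builds the URL and does not contact the server. The parameter names (q, fl[], sort[], rows, fmt) are taken from the link above, but the value "json" for fmt is an assumption (the link only shows fmt=tables), and the function name build_search_url is made up for illustration. The save and xmlsearch parameters from the link are left out.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.archive.org/advancedsearch.php"


def build_search_url(query, fields, sort=None, rows=50, fmt="json"):
    """Build an advanced-search URL (hypothetical helper, no network access).

    fmt="json" is an assumed value; the forum post only shows fmt=tables.
    """
    params = [("q", query)]
    # One fl[] parameter per field that should appear in the results.
    params += [("fl[]", field) for field in fields]
    if sort:
        params.append(("sort[]", sort))
    params += [("rows", str(rows)), ("fmt", fmt)]
    # urlencode percent-escapes ':' and '[]' and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url(
    "title:mikrokosmus AND creator:lotze AND mediatype:texts",
    ["creator", "identifier", "title", "year"],
    sort="year asc",
)
print(url)
```

Building the parameters as a list of pairs keeps the repeated fl[] entries in order and avoids the kind of escaping problems that hand-pasted links are prone to.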

You could also specify that the volume number be returned with the results, but the problem there is that we have volume numbers for hardly any of the Google books (at the moment, only about 15 that I added manually after an Archive user noted the correct volumes in reviews of the items). Initially, Google was not posting volume info on their page for each book, which left us no way of knowing what volume each one was. But I now see that they've begun posting that information for at least some books - I will look into adding a check for volume info to our metadata clean-up passes.

This post was modified by hank_b on 2009-05-29 00:28:34


Poster: bookdev Date: Oct 6, 2009 5:17pm
Forum: texts Subject: Re: How to download all titles

Could someone please be so kind as to suggest how to download all text titles/contributors, if you don't mind? I have tried using the advanced search in CSV, XML and JSON formats, but each of them dies after one or two hundred thousand titles (depending on the format chosen). I have also tried to break the database down into small sets using ranges, but haven't found a criterion that will do that. Date ranges, for example, seem to produce unexpected results for me.


Poster: Time Traveller Date: Oct 6, 2009 7:01pm
Forum: texts Subject: Re: How to download all titles

You are talking about 1.5 million items. You will never be able to read all the title details in your lifetime, and certainly not the full texts.

Just why do you want to download all that, seeing that the material is just as accessible when left on the Archive?

You are talking about having it on your PC, so what will happen when, two hours later, your database is out of date, seeing that uploads are happening every minute, 24/7?


Poster: stbalbach Date: Oct 6, 2009 8:48pm
Forum: texts Subject: Re: How to download all titles

I've never tried it, but this post describes how. It sounds like you've already seen it; I'm linking to it in case you haven't. It is odd that advanced search bombs out after a limit.

Stephen


Poster: bookdev Date: Oct 7, 2009 12:32pm
Forum: texts Subject: Re: How to download all titles

Thanks for the link. Unfortunately, the method described only works on small exports. And when I try to reduce the size of the export by adding "<" or ">" I usually get the error:
"Search engine returned invalid information or was unresponsive. We are working to resolve this issue."


Poster: stbalbach Date: Oct 7, 2009 2:00pm
Forum: texts Subject: Re: How to download all titles

That's usually the error I get when I made a syntax error in the search string. Not saying that is the case here, but sometimes the search strings can be very complex. If you want to post a sample one here I'd be happy to take a second look.
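As one illustration of how an export might be split into smaller sets, here is a rough sketch that generates one query string per block of years. It assumes the search engine accepts Lucene-style bracketed ranges such as year:[1800 TO 1809] on the year field, which this thread does not confirm; the function name year_range_queries is invented for the example. Items with no year recorded would not match any of these queries.

```python
def year_range_queries(base_query, start_year, end_year, step=10):
    """Return one query string per block of `step` years (hypothetical helper).

    The year:[lo TO hi] range syntax is an assumption about the search engine.
    """
    queries = []
    for lo in range(start_year, end_year + 1, step):
        # Clamp the last block so it never runs past end_year.
        hi = min(lo + step - 1, end_year)
        queries.append(f"({base_query}) AND year:[{lo} TO {hi}]")
    return queries


for q in year_range_queries("mediatype:texts", 1800, 1829):
    print(q)
```

Generating the strings mechanically at least rules out typing mistakes as the cause of a syntax error, so any remaining failure points to the engine rather than the query.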


Poster: Victor3 Date: May 30, 2009 7:22am
Forum: texts Subject: Missing volume numbers; long subtitles

* I have now also started to write missing volume numbers in reviews.

* I now do not write long subtitles in the "title" field because the longest which is shown is (here)
"Mikrokosmus: Ideen zur Naturgeschichte und Geschichte der Menschheit" while the complete title is
"Mikrokosmus 3; Ideen zur Naturgeschichte und Geschichte der Menschheit; Versuch einer Anthropologie, 3 Bände; Dritter Band: 7. Die Geschichte, 8. Der Fortschritt, 9. Der Zusammenhang der Dinge"
Unless you advise me otherwise, I write complete titles including long subtitles in the description field.

This post was modified by Victor3 on 2009-05-30 14:22:30


Poster: hank_b (Administrator, Curator, or Staff) Date: May 30, 2009 4:19pm
Forum: texts Subject: Re: Missing volume numbers; long subtitles

If you see Google books that are missing volume info, and the identifier begins with a letter between 'm' and 'z', you might want to wait a few days before posting volume info in the review. The clean-up pass is proceeding alphabetically and is now in the m's; it should reach the end of the alphabet sometime this coming week. In fact, it visited mikrokosmusidee08lotzgoog today, about half an hour after you did, and added the same volume info you had put in your review.

We're now up to 32,089 Google books with volume info (being added at about 13,000/day).


Poster: Victor3 Date: May 28, 2009 4:33pm
Forum: texts Subject: Re: How to correct a title already uploaded?

Ah, I had overlooked this possibility!
* But if the year field for these books is complete, why does this information not show up in the normal search
http://www.archive.org/search.php?query=title%3A(mikrokosmus)%20AND%20creator%3A(lotze) ? That would be very useful!

* The presentation of the search results as an HTML table does not work in my Firefox 3.0.10 (nor in IE). The data are not shown at all. Can you tell me why? The XML format works.

* The volume information could go into the title or into the volume field, or both. Both would work equally well for me.


Poster: hank_b (Administrator, Curator, or Staff) Date: May 28, 2009 5:38pm
Forum: texts Subject: Re: How to correct a title already uploaded?

I'm also using Firefox 3.0.10 (on a Mac). I see the link to my search query didn't quite make it through the forum system intact, so maybe that's part of the problem you had. Let's try this: http://preview.tinyurl.com/kmydn5 . The search engine is sometimes a little slow to respond - if you see something like "waiting for homeserver7.us.archive.org" in the status bar, be patient.

The basic search only displays a certain fixed set of fields; no matter how we tweak that set, something that someone wants to see included will be left out. Thus the advanced search: you get to choose which fields you want to see.

I've implemented volume-seeking, and volume info is now being added as we process (or reprocess) Google books. So far we've climbed from 15 Google books with volume info to 30.


Poster: Victor3 Date: May 28, 2009 9:10pm
Forum: texts Subject: Re: How to correct a title already uploaded?

* Thank you, now (at another computer) it works. Great, I should have tried it before!
I just saw that the year now always appears in the detail window of every book, in parentheses after the title. If I am not mistaken, that was not so a few weeks ago.

* I just realised how many new texts there are in Google in my field of interest (philosophy in German). It is a lot of work to download and upload them. I hope others or the team are working on this too and have a more efficient way of doing it.


Poster: hank_b (Administrator, Curator, or Staff) Date: May 28, 2009 9:43pm
Forum: texts Subject: Re: How to correct a title already uploaded?

What appears in parens after the title on the details page is actually the "date" field, not the "year." The two values are often the same, but not always. "Year" is when the particular item was published; "date" pertains to the series the item belongs to.

So for periodicals, "date" is when the series began (or sometimes the range of years the series covered), while "year" will be when a single issue was published. See, for instance, http://www.archive.org/details/ntsiklopediches01andrgoog .

Books usually have a single publication date, used for both "date" and "year", although that, too, can be complicated by multiple editions, translations, etc.